A pandas Series is a table with one column, a row index, and a name.
Row Name/Column Name | Name 1 | Name 2 | Name 3 |
---|---|---|---|
0 | V1 | V2 | V3 |
1 | V1 | V2 | V3 |
import pandas as pd
lis = [1, 2, 3, 4, 5, 6]
s = pd.Series(lis, name='A', index=['Zero', 'One', 'Two', 'Three', 'Four', 'Five'])
df = pd.DataFrame(lis, columns=['A'], index=['Zero', 'One', 'Two', 'Three', 'Four', 'Five'])
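To make the difference between the two objects concrete, here is a minimal sketch that rebuilds them and inspects a few attributes. The names `labels` and `values` are introduced here for illustration only.

```python
import pandas as pd

labels = ['Zero', 'One', 'Two', 'Three', 'Four', 'Five']
values = [1, 2, 3, 4, 5, 6]

# A Series: one column of values, a row index, and a name.
s = pd.Series(values, name='A', index=labels)

# A DataFrame: one or more named columns sharing the same row index.
df = pd.DataFrame(values, columns=['A'], index=labels)

print(s.name)       # the name given to the Series
print(s.shape)      # one-dimensional
print(df.shape)     # two-dimensional: rows x columns
print(s['Three'])   # label-based lookup on the row index

# Selecting a single column of a DataFrame gives back a Series.
print(df['A'].equals(s))
```

Selecting one column from the DataFrame returns a Series, which is why most Series operations carry over to DataFrame columns.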
> Usually we work with datasets that we load into DataFrames for analysis. These files can be CSV, Excel, or plain text, and we need to make sense of them.
o = pd.read_csv('olympics.csv')
o.head()
We can see that the result has an unwanted index and unwanted column names. What we actually need is the first row as the column names and the first column as the index.
We can do this by taking advantage of the parameters of **read_csv()**:
o = pd.read_csv('olympics.csv', skiprows=1, index_col=0)
o.head()
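The effect of the two parameters is easier to see on a tiny input. The sketch below uses an in-memory string as a hypothetical stand-in for the file (its contents are made up for illustration and are not taken from olympics.csv).

```python
import io
import pandas as pd

# Hypothetical stand-in for a file whose first line is junk
# and whose second line holds the real column names.
raw = io.StringIO(
    "0,1,2\n"
    "Country,Gold,Silver\n"
    "Algeria,5,2\n"
    "Argentina,18,24\n"
)

# skiprows=1 drops the first line, so the next line becomes the header;
# index_col=0 uses the first column as the row index.
demo = pd.read_csv(raw, skiprows=1, index_col=0)

print(demo.columns.tolist())
print(demo.index.tolist())
```

After loading, the country names form the index and only the medal columns remain as data columns.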
> This gives a much better result, but not entirely. We can still see that some column names do not make sense or are ambiguous. Let's do a little more formatting.
for col in o.columns:
    if col[:2] == '01': o.rename(columns={col:'Gold'+col[5:]},inplace=True)
    if col[:2] == '02': o.rename(columns={col:'Silver'+col[5:]},inplace=True)
    if col[:2] == '03': o.rename(columns={col:'Bronze'+col[5:]},inplace=True)
    if col[:1] == '№': o.rename(columns={col:'#'+col[2:]},inplace=True)
o.head()
R/C | #Summer | Gold | Silver | Bronze | Total | #Winter | Gold1 | Silver1 | Bronze1 | Total.1 | #Games | Gold2 | Silver2 | Bronze2 | Combined total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Afghanistan (AFG) | 13 | 0 | 0 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 13 | 0 | 0 | 2 | 2 |
Algeria (ALG) | 12 | 5 | 2 | 8 | 15 | 3 | 0 | 0 | 0 | 0 | 15 | 5 | 2 | 8 | 15 |
Argentina (ARG) | 23 | 18 | 24 | 28 | 70 | 18 | 0 | 0 | 0 | 0 | 41 | 18 | 24 | 28 | 70 |
Armenia (ARM) | 5 | 1 | 2 | 9 | 12 | 6 | 0 | 0 | 0 | 0 | 11 | 1 | 2 | 9 | 12 |
Australasia (ANZ) [ANZ] | 2 | 3 | 4 | 5 | 12 | 0 | 0 | 0 | 0 | 0 | 2 | 3 | 4 | 5 | 12 |
This is much better and more understandable. Now we can use it for further analysis.
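As an aside, the renaming loop above can also be written as a single call, because `rename` accepts a function that is applied to every column label. This is a sketch, not the original approach: the raw column names in `demo` are assumed for illustration, and the helper `clean` and the dict `prefixes` are hypothetical names. The slices are the same ones used in the loop.

```python
import pandas as pd

# Hypothetical column names that mimic a raw header; assumed, not taken from the file.
demo = pd.DataFrame(
    [[13, 0, 0, 2, 0, 0]],
    columns=['№ Summer', '01 !', '02 !', '03 !', '№ Winter', '01 !.1'],
)

prefixes = {'01': 'Gold', '02': 'Silver', '03': 'Bronze'}

def clean(col):
    """Return the cleaned name for one column, using the same slices as the loop."""
    if col[:2] in prefixes:
        return prefixes[col[:2]] + col[5:]
    if col[:1] == '№':
        return '#' + col[2:]
    return col

# rename applies the function to every column label and returns a new DataFrame.
demo = demo.rename(columns=clean)
print(demo.columns.tolist())
```

Returning a new DataFrame instead of using `inplace=True` keeps the original untouched, which makes the step easier to rerun while experimenting.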
o['Silver'] >= 5
Afghanistan (AFG) False
Algeria (ALG) False
Argentina (ARG) True
Armenia (ARM) False
Australasia (ANZ) [ANZ] False
Name: Silver, dtype: bool
For example, the expression above tells us, for every country, whether it has won 5 or more silver medals.
The comparison is broadcast to every value in the o['Silver'] Series and returns a boolean Series.
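A boolean Series like this is typically used as a mask to select rows. The sketch below shows the idea on a small stand-in DataFrame built from the Silver counts in the table above; the names `demo`, `mask`, and `winners` are illustrative.

```python
import pandas as pd

# Small stand-in for the cleaned table, holding only the Silver column.
demo = pd.DataFrame(
    {'Silver': [0, 2, 24, 2, 4]},
    index=['Afghanistan (AFG)', 'Algeria (ALG)', 'Argentina (ARG)',
           'Armenia (ARM)', 'Australasia (ANZ) [ANZ]'],
)

mask = demo['Silver'] >= 5   # boolean Series, one entry per row
winners = demo[mask]         # keep only the rows where the mask is True

print(winners.index.tolist())
print(mask.sum())            # True counts as 1, so this counts the matches
```

Indexing with the mask keeps only the matching rows, and summing it counts how many countries satisfy the condition.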